A robust two-gene oscillator at the core of Ostreococcus tauri circadian clock 
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The microscopic green alga Ostreococcus tauri is rapidly emerging as a promising model organism 
in the green lineage. In particular, recent results by Corellou et al. [Plant Cell 21, 3436 (2009)] 
and Thommen et al. [PLoS Comput. Biol. 6, el000990 (2010)] strongly suggest that its circadian 
clock is a simplified version of Arabidopsis thaliana clock, and that it is architectured so as to be 
robust to natural daylight fluctuations. In this work, we analyze time series data from luminescent 
reporters for the two central clock genes TOC1 and CCA1 and correlate them with microarray data 
previously analyzed. Our mathematical analysis strongly supports both the existence of a simple 
two-gene oscillator at the core of Ostreococcus tauri clock and the fact that its dynamics is not 
affected by light in normal entrainment conditions, a signature of its robustness. 



In order to anticipate periodic en- 
vironmental changes induced by Earth 
rotation, many organisms have evolved 
a circadian clock, a genetic oscillator 
which generates biochemical rhythms 
with a period around 24 hours. Exact 
synchronization with the day/night cy- 
cle requires that one or more clock com- 
ponents sense daylight (for example, a 
protein degrades faster in the light). 
However, daylight intensity can highly 
fluctuate from day to day, or during the 
day, due to environmental factors such 
as sky cover. The question then arises 
as to how circadian clocks can keep 
time without being continously reset by 
the signal that should entrain them? 
A mathematical analysis of Ostreococ- 
cus tauri clock, whose molecular basis 
has been identified recently [1], has un- 
veiled a simple and elegant mechanism 
which exploits the dynamical proper- 
ties of the core clock oscillator to make 
it robust to daylight fluctuations [2|. 
When the clock is on time, coupling 
to light is activated precisely when the 
oscillator does not respond to external 
perturbations, making it blind to light 
and its fluctuations. If the clock has 
drifted and needs resetting, however, 
light affects the oscillator in a different 
part of its cycle, where it reacts so as 
to recover the entrainment phase. In 
this work, we provide strong evidence 
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for the presence of a robust two-gene 
oscillator at the core of Ostreococcus 
clock by showing that a minimal light- 
independent model can reproduce very 
accurately microarray data and lumi- 
nescent reporter data recorded in dif- 
ferent experiments. 

I. INTRODUCTION 

Biochemical oscillations are widespread biological phe- 
nomena involved in many important cellular processes 
such as signaling, development, motility or metabolism 
[3, H[ . Understanding how such a simple dynamical pat- 
tern has been implemented repeatedly to support a great 
diversity of biological functions is appealing for scientists 
seeking to uncover unifying principles Q. By identify- 
ing the molecular machinery behind cellular rhythms, 
molecular biology has first laid the groundwork for the 
development of strategies toward their comprehensive un- 
derstanding [6J. In a second stage, how a collective dy- 
namics can emerge in networks of interacting molecular 
actors and how it can be harnessed robustly has been 
under focus. Besides the design of synthetic genetic cir- 
cuits with specific oscillatory abilities @, H] , a common 
strategy to gain insight into this question has been to 
construct quantitative dynamical models constrained by 
experiments [H, 0-[ll| . Indeed, the nonlinear dynamical 
behaviors that underlie oscillations can only be fully cap- 
tured through a mathematical description. 

However, the ever-growing amount of experimental 
data obtained with genetic transformations and real-time 
monitoring presents extraordinary challenges to model- 
ers. How to design relevant mathematical models that 
not only reproduce the data but give insight into the ar- 
chitecture of a biological system, and avoid pitfalls such 
as overfitting which can ascribe biological meaning to ex- 
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perimental artifacts? As we hope to illustrate here, it is 
important to combine careful data analysis and minimal 
modeling, and to check the consistence of the different 
data sets at all stages of model building. 

Circadian clocks are systems of choice for quantitative 
studies of the function and design of biochemical oscil- 
lators [IH • They operate in many species to keep track 
of the most regular environmental constraint, the alter- 
nation of daylight and darkness caused by Earth rota- 
tion, so as to finely control the cellular physiology ac- 
cordingly [1 31 ] - This basic function is vital in many or- 
ganisms such as plants, which need to timely coordinate 
their photosynthesis, and thus their growth and division, 
to daily changes in light intensity [lJ-Qjl- However, a 
precise entrainment of circadian rhythms to the diurnal 
cycle is potentially challenged by many sources of vari- 
ability including molecular fluctuations [It], HI] and tem- 
perature variations [III [2(| , but also fluctuations of day- 
light intensity from day to day or during the day, due to 
environmental factors such as sky cover 0, [2l| - [23j . To 
account for the remarkable ability of circadian oscillators 
to run autonomously and to be precisely and robustly 
entrained, experimental efforts aimed at unraveling their 
complex architecture fT\ 2li have motivated studies try- 
ing to adjust mathematical models to experimental data 
pa J27l - l33j . an approach that is increasingly necessary as 
models become more complex, featuring generally several 
feedback loops. 

In this paper, we use such a quantitative modeling 
approach to investigate the dynamics of the circadian 
rhythms of the smallest free-living eukaryote known to 
date, Ostreococcus tauri. This microscopic green alga dis- 
plays a very simple cellular organization and a small and 
compact genome (33 - (36j . Very recently, the molecular 
basis of its circadian clock has been extensively charac- 
terized by Corellou et ai, who carried out an extensive 
work of genetic transformation, leading to transcriptional 
and translational fusion lines allowing one to monitor 
transcriptional activity and protein dynamics in living 
cells [H, Their results point to a core archictec- 

ture comprising two genes, similar to Arabidopsis cen- 
tral clock genes TO CI and CCAl [H]. These two genes 
display rhythmic expression both under light/dark alter- 
nation and in constant light conditions. Thus, the uni- 
cellular green alga Ostreococcus tauri has emerged as a 
promising organism model to study the circadian rhythm 
in single photosynthetic eukaryotic cell combining exper- 
imental and modeling approaches. 

The goal of this modeling study is to check care- 
fully whether experimental data obtained through var- 
ious channels support the hypothesis that the circadian 
clock of Ostreococcus Tauri contains a simple two-gene 
transcriptional loop serving as a core oscillator [J 0] , 
and that the dynamics of this oscillator is not affected 
by light in normal entrainment conditions 0- An im- 
portant point of the present work is the simultaneous 
adjustment of microarray data and luminescent reporter 
data recorded in two different experiments [l], [l5j under 



different conditions. While this may not be sensible for 
a general system whose parameters will typically change 
with experimental conditions, it is expected here that the 
core clock oscillator is robust to environmental changes. 
As advocated by Thommen et al. |2}, it should thus de- 
liver similar signals in different experiments carried out 
with the same photoperiod. Therefore, being able to re- 
produce with the same model signals recorded with dif- 
ferent techniques in different experiments provides strong 
evidence that the core Ostreococcus clock oscillator is in- 
deed robust. 

Before adjusting a minimal model to experimental 
data, a careful data analysis of both microarray and lu- 
minescence time series has been performed. It allowed 
us to detect and remove an experimental artifact that 
would otherwise have prevented an optimal adjustment 
and thus would have falsely called for a more complex 
model. The combined use of microarray data and lu- 
minescence reporter data provided a unique opportunity 
to calibrate the latter with respect to the former, given 
that very little is known on luciferase dynamics in Ostre- 
ococcus. This is all the more important as an extensive 
collection of experimental data has been acquired with 
these reporters and will form the basis of future quanti- 
tative studies of Ostreococcus clock. 

We not only found that a simple two-gene transcrip- 
tional feedback loop model reproduces perfectly the 
CCA1 and TO CI transcript and protein profiles ob- 
tained from data analysis, but that it does so with no 
model parameter depending on light intensity whereas 
the data have been recorded under light/dark alterna- 
tion. This confirms our previous observation based only 
on microarray data of limited time resolution |2J. As 
we have proposed previously, this counterintuitive find- 
ing indicates that the coupling to light does not influence 
the oscillator in normal entrainment conditions, shielding 
it from daylight fluctuations 2]. Besides confirming the 
two-gene loop hypothesis and the oscillator robustness, 
this work also identifies unambiguously the mechanistic 
origin of oscillation in the transcriptional negative feed- 
back loop involving CCAl and TOC1, delayed by the 
saturated degradation of CCAl mRNA and TOC1 pro- 
tein. 



II. DATA ANALYSIS 

In this section, we first present the experimental 
data and explain how we determine mRNA and pro- 
tein concentrations from them. We then discuss the sim- 
plest model of the TOC1-CCA1 negative feedback loop, 
which consists of four differential equations describing the 
time evolution of the TO CI and CCAl mRNA and pro- 
tein concentrations. To match the numerical time profiles 
generated by this model, we need to reconstruct target 
profiles from the experimental data. RNA profiles will be 
interpolated from microarray data whereas protein pro- 
files will be extracted from luminescence time series in the 
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form of a Fourier series describing the dynamics at long 
time scales. Both types of profile will be characterized by 
their times of passage at 20% and 80% of the oscillation 
amplitude, which provide a uniform characterization of 
the two types of data considered here. Model adjustment 
will be carried out by minimizing discrepancies between 
the experimental and numerical passage times. 



A. Experimental data 

The microarray data used here come from the study 
in Ref. 15|. They were recorded at three-hour time in- 
tervals under 12:12 light/dark alternation. The mRNA 
time profiles that will serve as targets for model adjust- 
ment are the same as in our previous study 0, where 
we showed that they could be accurately approached by 
solutions of a simple set of differential equations, an in- 
dication of the high quality of these data. The fact that 
microarray data reflect mRNA level without ambiguity 
was also checked by quantitative RT-PCR. 

The luminescent reporter data used here are those pre- 
sented by Corellou et al. in 1] to provide evidence of the 
central role of the TOC1 and CCA1 genes in Ostreococ- 
cus clock. They had been recorded at one-hour intervals 
under a 12:12 light/dark cycle from two types of trans- 
genic cell lines. 

In transcriptional fusion lines pTOClduc and 
pCCAlduc, the sequence inserted into the genome con- 
sists of the promoter of one of the two clock genes fused 
to the coding sequence of firefly luciferase [l|, so that 
TOC1 or CCA1 transcriptional activity drives luciferase 
expression. Luciferase catalyzes the transformation of 
the substrate luciferin into oxyluciferin, in which one 
photon is emitted [39j . The luminescence signal provides 
information about the quantity of luciferase synthesized 
and thus on the transcriptional activity of the promoters. 

In translational fusion lines TOClduc and CCAlduc, 
both the coding sequences of one of the clock proteins 
(TOC1 or CCA1) and of luciferase are fused to the pro- 
moter [lj, so that a fusion protein combining the clock 
protein with luciferase is synthesized by the additional 
gene with a trancriptional activity similar to the origi- 
nal clock gene. In this case, the luminescence signal also 
provides information about the protein dynamics, and 
in particular on its degradation kinetics. It is usually 
assumed that luciferase remains complexed and inactive 
after the reaction and does not recover its activity before 
being degraded. Thus two limiting cases can be consid- 
ered depending on whether the luciferase reaction time 
(defined as the average time from synthesis to photon 
emission) is much shorter or much longer that the pro- 
tein lifetime. 

In the former case, luciferase reacts immediately after 
being synthesized. Since only freshly made proteins then 
contribute to the luminescence signal, the latter is ob- 
viously proportional to the protein synthesis rate, hence 
to cytoplasmic fusion mRNA concentration. If we as- 



sume for simplicity that the kinetic constants (synthesis 
and degradation rates) of the fusion proteins TOCduc 
and CCAlduc and of their mRNA are similar to that of 
their native counterparts, then the concentrations of the 
former are proportional to the concentrations of the lat- 
ter. In this case, the luminescence signal tracks mRNA 
concentration. 

In the opposite case where the probability of photon 
emission by luciferase is very low, and with the same 
assumption of identical constants for fusion and native 
molecular actors, it is easily seen that the luminescence 
signal is proportional to protein concentration. In this 
limit, indeed, the photon emission probability is constant 
throughout protein lifetime, thus luminescence intensity 
is proportional to the number of TOClduc or CCAlduc 
fusion proteins, which is in turn proportional to the num- 
ber of native clock proteins. In this case the luminescence 
signal can be used as an indicator of protein concentra- 
tion. In the general case, it should be intermediate be- 
tween the RNA and protein time courses. 

A time lag of at least two hours between the maxima 
of RNA concentration as indicated by microarray data 
and the maxima of translational fusion lines can be ob- 
served, which indicates that photon emission probability 
is low and that luciferase lifetime is large compared to 
other protein lifetimes. This is consistent with recent 
experiments in which translation was blocked in Ostreo- 
coccus cell cultures with emetine dihydrochloride and 
time evolution of TOCduc luminescence was monitored. 
It was observed that the signal would slowly decay over 
more than 12 hours, implying that the luciferase reac- 
tion time is at least of this order. Thus, we will use in 
the following translational fusion line signals as indica- 
tors of protein concentrations, which allows us to keep 
our mathematical model as simple as possible. 



B. Model of the TOC1-CCA1 oscillator used for 
data adjustment 

The minimal model for the TOC1-CCA1 transcrip- 
tional feedback loop, where TOC1 activates CCA1 and 
CCA1 represses TOC1, consists of the following four or- 
dinary differential equations: 



Mt = Mt 



K Mt M t . 
%r- — , r ( la) 



P T = f3 T M T - 5 P . 
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Eqs. (JTJ describe the time evolution of mRNA concentra- 
tions Mq and Mt and protein concentrations Pc and Pt 



4 



for the CCAl and T0C1 genes, respectively, as they re- 
sult from mRNA synthesis regulated by the other protein, 
translation and enzymatic degradation. TOC1 transcrip- 
tion rate varies between \xt at infinite CCAl concentra- 
tion and ht + At at zero CCAl concentration accord- 
ing to the usual gene regulation function with threshold 
Pco and cooperativity nc- Similarly, CCAl transcrip- 
tion rate is \xc (resp., /ic + ^c) at zero (resp., infinite) 
TOC1 concentration, with threshold Pto and coopera- 
tivity tit- TOC1 and CCAl translation rates are (3t 
and (3c, respectively. For each species X, the Michaelis- 
Menten degradation term is written so that 6x is the 
low-concentration degradation rate and Kx is the satu- 
ration threshold. 



A more detailed model would take into account com- 
partmcntalization, with each actor having separate nu- 
clear and cytoplasmic concentrations, as well as the fact 
that the luminescence signal is linked to a third gene arti- 
ficially inserted in the genome, with the kinetic constants 
of its mRNA and protein possibly different from those of 
its native counterpart. With this model, it is the pro- 
file predicted for the total fusion protein concentration 
that would be adjusted to the experimental time series 
rather than the native protein profile. However, the im- 
portant point is that such a biochemically detailed model 
taking into account all native and inserted molecular ac- 
tors reduces to Eqs. ([1]) in the limiting case where fusion 
proteins and mRNA have the same kinetic constants as 
the native molecules (e.g., TOClduc and TOC1 degrada- 
tion rates are equal) , luciferase reaction time is large and 
nucleocytoplasmic transport is fast. When the minimal 
model already adjusts the data with excellent accuracy, 
as will be the case here, there is no point in using the 
sophisticated model, which can only fit the data better. 
In fact, its higher flexibility could lead to overfit the data 
and ascribe incorrectly biological meaning to experimen- 
tal artifacts. 



As in 0, the free- running period (FRP) of the uncou- 
pled oscillator is chosen equal to 24 hours, which was 
the mean value observed in experiments [l[ . In this case, 
this oscillator is adjusted to experimental data without 
modulation since our goal in this work is to show that 
the average oscillation does not carry any signature of 
coupling to light. As a control, we will also consider ad- 
justment to oscillators of FRP 23.8 and 25 hours under 
day conditions. In these two cases, a small modulation 
of some parameters is required to achieve frequency lock- 
ing. To keep the FRP fixed during adjustment, we rescale 
kinetic parameters after each optimization step so that 
the period of the free-running limit cycle matches the de- 
sired value. This subsequently applies to the parameter 
set which the adjustment converges to. 



C. Reconstruction of target profiles and 
determination of passage times 

Experimental RNA target profiles were estimated from 
the same microarray data points as used in (2j. Due 
to the high quality of data, noise reduction was not re- 
quired. However, the relatively low temporal resolution 
(one sample every three hours) made it difficult to es- 
timate accurately passage times and positions of expres- 
sion peaks. Thus the profiles were obtained by interpolat- 
ing between data points. Time series in the logarithm of 
RNA concentration showed a very smooth behavior and 
were well approximated by simple cubic splines (Fig. [I}. 
These interpolating curves were thus considered to pro- 
vide an optimal approximation to the variations of the 
logarithm of mRNA concentration, and mRNA linear 
target curves were generated by exponentiation of the 
interpolating curves. 

That the largest data point is less than half of the max- 
imum of the reconstructed TOC1 mRNA profile should 
not be surprising. In fact, this profile is the most prob- 
able one given the data points if we assume sufficient 
smoothness, or equivalently that the same dynamics acts 
over the 24-hour cycle. Any other curve would imply 
fast, transient, processes not linked to the TOCl-CCAl 
loop. Since RNA concentration rises from almost zero 
at ZT6 (Zeitgeber Time 6, i.e., 6 hours after dawn) to 
almost the maximum level measured at ZT9 and decays 
from the same level at ZT12 to almost zero at ZT15, it is 
indeed obvious that the true maximum located between 
ZT9 and ZT12 must be much higher than the two largest 
data points. Incorrectly assuming that the largest data 
point is the profile maximum would in fact bias the anal- 
ysis. 

We checked that interpolating either the CCA 1 mRNA 
concentrations or their logarithms lead to two almost su- 
perimposing profile curves, showing that both sequences 
of values provide the same information. However the 
profile interpolated directly from TOC1 mRNA concen- 
trations rather than from their logarithms was highly 
nonphysical, with interpolated concentrations becoming 
clearly negative on each side of the expression peak. This 
is a direct consequence of the fact that only two data 
points are well above the zero level. Implicit in the in- 
terpolation procedure is the reconstruction of successive 
time derivatives of a function from its values at differ- 
ent points. The long sequence of almost zero data points 
makes the correspondence between function values and 
time derivatives nearly singular, so that the reconstruc- 
tion is highly unstable. During the time interval near 
zero, all time derivatives up to high order appear to be 
zero, then jump to high values at the beginning and the 
end of the expression peak. Since the interpolation error 
is proportional to the maximum value of the fourth time 
derivate in the interval, the resulting approximation is 
poor. 

The series of logarithms of TO CI mRNA concentra- 
tions does not have this problem and allows optimal re- 
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FIG. 1: Construction of RNA target profiles as functions of 
Zeitgeber Time (ZT) describing phases within the dark/light 
cycle, with time ZTO corresponds to dawn and time ZT12 to 
dusk. Interpolating curves going through data points (dots) 
are represented, (a) Interpolated curves from microarray data 
using cubic splines, (b) The exponentials of the interpolating 
curves were computed to obtain a smooth approximation of 
mRNA profiles. Large dots indicate passages through levels 
corresponding to 20% and 80% of maximum amplitude and 
serve as target points for adjustment. 



construction of the TO CI mRNA profile. The fact that 
the logarithm of a time series with long intervals near zero 
provides more information about the dynamics than the 
original time series has also been noted in the analysis 
of chaotic time series [10, El- Furthermore, it should be 
noted that many statistical analyses of microarray data 
use the logarithms of mRNA concentrations because they 
are more evenly distributed and provide more informa- 
tion. 

Compared to a more usual method based on least- 
square-fitting the data points (as used in [2]), this inter- 
polation procedure can only make adjustment more dif- 
ficult, since it constrains tightly the shape of the profile. 
However, it is fully consistent with the good agreement 
found with a low-dimensional ODE set in our previous 
study 0| . The passage times were then determined as 
the times at which the interpolated profile would cross 
the appropriate level (Fig. [JJ. 

Protein profiles were reconstructed from translational 
fusion luminescence signals used as indicators of protein 
concentrations, as discussed above. The reconstruction 
was more involved than for mRNA because these time 
series display significant amplitude variations from peak 
to peak across the experiment, which may be for instance 



linked to variations in the number of cells contributing 
to light emission or other experimental factors. 

The first 48 recording hours, considered as transient, 
were discarded and traces were then least-squares fit- 
ted by the product p(t)F(t) of a Fourier series F(t) — 
Sfc=o ak cos(2irkt/T + <fik), with T = 24 hours, by a low- 
order polynomial p(t) accounting for slowly varying ex- 
perimental conditions (Fig. [2]). The order of the poly- 
nomial p(t) was generally chosen to be 3 or 4, roughly 
equal to the number of periods in the fitted segment to 
avoid over-fitting. The Fourier series F(t) represents the 
average periodic biochemical oscillation associated with 
circadian oscillations while p(t) models a slowly varying 
gain (due for example to a variable number of cells con- 
tributing to light emission). 

Because we are interested in long lasting mechanisms 
sustaining the oscillation, the number N of harmonics of 
the Fourier series was fixed to 5 (an odd number, since 
the spectrum of the square wave forcing by light only 
contains odd harmonics). We also tried slightly higher 
numbers of harmonics, with little difference in the results. 
Fourier fitting naturally smoothes out acute responses at 
day/night and night/day transitions that can be seen in 
the raw signals and which may reflect rapid mechanisms 
involved in clock resetting. This separation of dynamical 
processes occurring on a 24-hour time scale on the one 
hand, and confined to short time intervals on the other 
hand, must be understood with reference to our previous 
finding that the average dynamics of the TOCl-CCAl 
oscillator is identical to that of a free-running oscillator, 
with coupling to light and resetting occuring transiently 
in specific time intervals [2j. Confirming this key obser- 
vation with a careful analysis of the luminescence time 
series is one of the main goals of this paper. It will be 
reached if we find that the average protein profile does 
not carry any signature of a coupling to light. 

The Fourier series obtained from the fits of all bi- 
oluminescence traces (possibly corresponding to differ- 
ent transgenic lines) recorded in the experiment were 
then averaged, and the average was used as surrogate 
for the protein temporal profile after suitable normaliza- 
tion. The resulting profile is shown in Fig. EJb) , along 
with individual luminescence traces from different wells, 
renormalized using the slowly varying polynomial func- 
tion p(t). The variability observed is partially due to the 
fact that a few different transgenic lines, corresponding 
to different insertions in the genome, were used. The 
times of passage at 20% and 80% of the amplitude were 
then determined, as well as the minimal level reached, 
for subsequent model adjustment. 

We also tried fitting each luminescence trace to 
p(t)F(t) + b where b represents a possible constant bias. 
Surprisingly, the fit was consistently better when b was 
equal to the minimum luminescence level. In other words, 
the Fourier fitting procedure tended to remove the floor 
level. Because this could indicate the existence of a bias 
in the luminescence level, we examined more closely the 
raw data, and found that indeed the zero luminescence 
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FIG. 2: Reconstruction of protein target profiles as func- 
tions of Zeitgeber Time (ZT) describing phases within the 
dark/light cycle, with time ZTO corresponds to dawn and time 
ZT12 to dusk, (a) Luminescence time series (dashed lines) for 
individual wells are fitted by the product of a Fourier series 
of period 24 hours with 5 harmonics by a slowly varying poly- 
nomial function of degree 4 (solid lines). The slowly varying 
envelope is also shown with a dotted line. From top to bot- 
tom, the first (last) two panels correspond to two TOCl:luc 
(CCAl:luc) translational fusion lines with different insertions 
in the genome. The different Fourier series are normalized to 
have the same maximum and are averaged, (b) The average 
Fourier series (dashed lines, black: TOC1, grey: CCA1) pro- 
vides a smooth approximation to the cloud of individual line 
time profiles renormalized using the slowly varying polyno- 
mial function and wrapped around 24 hours (thin solid lines) . 
Large dots indicate passages through levels corresponding to 
20% and 80% of maximum amplitude and serve as target 
points for adjustment. 



level does not correspond to the zero protein level. 

In Fig. [31 we show two individual traces monitoring lu- 
minescence intensities over time in two wells of the lumi- 
nometer plate. These two wells contain genetically iden- 
tical cell cultures monitored simultaneously in the same 
experiment. Besides an excellent overall reproducibility, 



the comparison of the two traces reveals the existence of 
a significant bias. Assuming that identical clocks run in 
different cells, and that there is a well-defined average 
number of fusion proteins per cell, the luminescence sig- 
nals should be proportional to numbers of cells in each 
well. Therefore, the two signals should be approximately 
proportional to each other. 

Contrary to this, we observe that both for TOCl:luc 
and CCALluc, the maxima of the two signals differ by 
a large factor, which remains roughly constant over time 
while minima almost coincide. If the two traces were 
proportional to each other, the minima should be in the 
same ratio as the maxima. The only simple explanation 
for the systematic coincidence of minima is that they 
correspond to a zero protein level, and that there is the 
same bias on the two time series. This bias can therefore 
be removed by substracting the two traces, as is shown 
in Fig. |31 where the difference curves provide a better 
estimation of the actual protein profiles than the original 
traces. Importantly, the CCA1 protein level is predicted 
to touch zero very shortly near ZT9, while the TOCf 
protein level appears to stay near zero for a large interval 
of time between ZT21 and ZT8 (Fig. [3]). Note that this 
bias is not constant in time but varies slowly so that 
the different minima correspond to different luminescence 
levels. 




FIG. 3: Translational fusion lines data recorded under 12: f 2 
LD alternation. Time zero corresponds to dawn. On top 
panel (resp., bottom panel) the time evolution of photon 
count (in relative luminescence units) of CCAf (resp., TOCf) 
translational fusion lines are drawn for two biological dupli- 
cates monitored at the same time in the same conditions 
(black and grey thin solid lines). Their difference is plotted 
as a thick black solid line. 



While the existence of an experimental bias in the lu- 
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minescence level is strongly supported by the observa- 
tions above, we have currently no explanation for it. In 
the following, we will therefore adjust two versions of the 
experimental profiles : (i) the Fourier series F(t) deter- 
mined directly from fitting p(t)F(t) to the time series, 
as described above, as if there was no bias, and (ii) the 
same Fourier series with the floor level removed so that 
they touch zero, as with the difference curves in Fig. [3] 
More precisely, the target curves are then F(t) — Fq 
where the F(t) are obtained from fitting p{t)F(t) and 
F = min t F(t). This will allow us to assess the influence 
of the bias on model fitting. Again, using a more so- 
phisticated model of the floor level bias could only lead 
to a better adjustment, which will not be needed, and 
would only make sense with a better understanding of 
the physical origin of the bias. 

D. Measuring goodness of fit by passage times 

Adjusting experimental data to a mathematical model 
requires a score function quantifying the discrepancy be- 
tween experimental and simulated profiles, to be mini- 
mized over parameter space. One difficulty is that mi- 
croarray and luminescent reporter data are very different 
in nature and in particular have different uncertainties so 
that it does make little sense to combine their adjustment 
errors. Therefore, we used passage times as a unifying 
measure of fitting errors, determining for each temporal 
profile the times of passage at 20% and 80% of the in- 
terval between the minimal and the maximal values. We 
then compared passage times for the experimental and 
numerical profiles, with the root mean square timing er- 
ror serving as a goodness of fit. Another advantage of 
passage times is that they have a clear biological signifi- 
cance, with crossings of the 20% level bracketing the time 
interval with expression significantly above background 
level and the crossings with the 80% level bracketing the 
expression peak. 

We denote by A^. and A^ (resp., A^. and A^) 
the time passage error in minutes for the concentration of 
species X (X = M c , Pc, M T , P T ) at 20% (resp., 80%) of 
the interval between minimal and maximal values when 
increasing and decreasing. For each species X, one can 
therefore define a quadratic error function equal to: 

Err(X) = (A^ ot ) 2 + (Af 0f ) 2 + (Af 0i ) 2 + (A*,,) 2 (2) 

The first, and most natural, score function is a RMS 
error combining error for all species (mRNA and pro- 
teins) : 

Smp = M E Err W ( 3 ) 

V XeM c ,Pc,M T ,P T 

In some cases, profiles with similar passage times differ 
by their floor level. We therefore introduce a floor level 

min(x) min(X*) , V> • ■ , 

error <3>x = twt where A is tnc numerical 

max A max ( A ) 



profile for species X and X* is the experimental target 
for X. We choose a ponderation such that a floor level 
error $x = 0.05 is equivalent to a passage time error of 
60 minutes. The score function of this second scheme is 



Smpf = -== Jm(s MP ) 2 + V ($x/$o) 2 (4) 
V18 V xgPcPt 

where $ = 0.05/60 = I/I200 is the relative floor level 
error considered equivalent to one minute. 

Finally, microarray and luminescence data were not 
only adjusted simultaneously but also separately in order 
to assess the coherence between the two types of data. To 
this end, we use the following score functions: 



S M = H £ Err(X) (5) 

V X£M C ,M T 




where the &x error term is scaled to 2$o so that it has 
the same relative importance in the total RMS error as 
in Eq. g]). 

III. RESULTS: ADJUSTMENT OF MODEL TO 
RNA AND PROTEIN DATA 

Figure 2] shows the numerical solutions of model ([T]) 
best adjusting the experimental data according to the 
score functions defined in Sec. Ill D[ both without taking 
into account the floor level bias (Figs. H)Ja)-(d)) or with 
floor levels removed (Figs. Hte)-(h)). The corresponding 
parameter values are given in Table HI 

A first remark is that although a set of four passage 
times is a very crude description of a temporal profile, it 
turns out that numerical solutions and experimental pro- 
files that have close passage times generally follow each 
other closely all over the day. This is consistent with 
the hypothesis that experimental data are well described 
by a low-dimensional ODE set. Secondly, the quality of 
the adjustment is globally very good given the simplicity 
of the model. Clearly, Fig. @] shows that Eqs. (P) can 
adjust simultaneously the protein profiles reconstructed 
from luminescence time series and the RNA profiles re- 
constructed from microarray data (score functions Smp 
and Smpf)- Especially, removing the protein profile 
floor levels so as to correct for the detected experimen- 
tal bias leads to impressive agreement between numerical 
and target curves, which almost superimpose onto each 
other. 

It is interesting to look more closely at the main dis- 
crepancies in Figs IUa)-(d). When only passage times 
are taken into account (score function Smp), the nonzero 
floor level of the TOC1 protein is not reproduced. This is- 
sue can be addressed by using a score function that takes 
into account floor level error (score function Smpf)- In 
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3 6 9 121518 2124 3 6 9 12 1518 2124 
Time (h) Time (h) 

FIG. 4: (Color figure) Adjustement of numerical solutions of the free-running model with FRP equal to 24 h (solid lines) to 
target curves (dash-dotted lines) using various score functions. Panels (a)-(d) in left column show adjustment of mRNA and 
protein profiles without bias correction, using score functions Smp (black solid line), Smpf (dark grey - blue thin solid line) 
and Sp (light grey or red thin solid line). See text for the definition of score functions. Panels (e)-(h) in right column show 
adjustment of mRNA and protein profiles with protein floor level removed, using score functions Smp (black solid line), Sp 
(light grey - red thin solid line) and Sm (dark grey or blue thin solid line), (a), (e): CCA1 mRNA; (b), (f) CCA1 protein; (c), 
(g) TOC1 mRNA; (d), (h) TOC1 protein. 



this case, the floor level agreement is improved, at the 
cost of slightly advancing the protein peak and of in- 
ducing a small floor level for TO CI mRNA. Although 
small, the latter is definitely much higher than the value 
that can be accurately determined from microarray data 
(which estimate the logarithm of mRNA concentration) . 
Last, the numerical solution obtained by adjusting the 
protein profiles alone (score function Sp) predicts mRNA 
profiles rather poorly (Fig. 0]), which could suggest that 
the two types of data are inconsistent. 

To an uninformed mind, these minor discrepancies 
would most probably suggest shortcomings of model (pj , 
not surprisingly given its caricatural simplicity, or the 
fact that microarray and luminescence data have been 
recorded in different experiments. However, these dis- 



crepancies are clearly linked to errors in floor level, which 
are naturally explained by the protein floor level bias dis- 
cussed in Sec. Ill CI and evidenced in Fig. [3] 

Let us now consider more closely adjustment of 
model (HJ) to protein target profiles with floor levels 
removed, together with RNA target profiles, shown in 
Figs. IUe)-(h). Goodness of fit is clearly improved com- 
pared to Figs IUa)-(d), where it was already good. The 
simultaneous adjustment of the four target curves (score 
function Smp) is achieved with an accuracy rarely ob- 
tained for a genetic circuit, with all numerical curves su- 
perimposing perfectly on the experimental ones, except 
that the numerical CCA1 protein profile rises slightly 
more slowly than the experimental one between dusk and 
CCA1 expression peak. It is quite striking to note that 
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TABLE I: Model parameter values. Optimal parameter values for adjustement of model to data using various score 
functions, assuming a free-running period of 24 hours. Parameters are rescaled so that the maximum value of protein profiles 
is 100 nM, the maximum value of CCAl (resp., TOC1) mRNA profile is 10 nM (resp., 70 nM). The TO CI and CCAl mRNA 
maximum values are chosen in the same proportion as in microarray data. The third row of the table indicates whether the 
floor levels of luminescence data are removed (R) or not (NR). The last part of the table gives the degradation rate Dx at the 
mean value X = (max(X) + min(X)) /2 ; D x = 8xKx / [Kx + X) for each species. 
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the two curves begin to separate precisely in the time 
interval where a transient window of CCAl stabilization 
was predicted to occur in Ref. [2|]. Remarkably, numeri- 
cal profiles obtained by adjusting RNA data alone (score 
function Sm) reproduce all experimental data very well 
also. When protein data are adjusted separately (score 
function Sp), RNA profiles are much better reproduced 
than when the protein floor is not removed, with only 
the CCAl mRNA profile showing noticeable discrepan- 
cies in certain parts of the day, but with the global shape 
of the profile preserved. Thus we can conclude that re- 
moving floor level significantly improves adjustment and 
restores coherence between microarray and luminescence 
data. Given the simplicity of the model, it is very un- 
likely that this may have occurred by chance. As we 
will discuss in the next section, it is moreover a remark- 
able finding that experimental data recorded with differ- 
ent techniques and in different experiments (T), [TEj can 
be adjusted simultaneously with such high accuracy by 
a minimal model. This certainly reveals an important 
property of the clock oscillator. 

The parameter sets in the last three columns of Ta- 
ble U which correspond to profiles with protein floors 
removed, are generally more consistent between them- 
selves than those in the first three columns, where the 
bias has not been corrected, and where important vari- 



ations can be seen (compare for example At across the 
first three columns). This reflects the fact that when the 
floor level is not removed, it is more difficult to predict 
mRNA profiles from adjusting protein profiles alone and 
vice versa. To make meaningful comparisons, it should 
however be kept in mind that the significant quantities 
are often not parameters themselves but combinations of 
them. When degradation is saturated, for example, the 
relevant quantity is the product 8xKx, and it may hap- 
pen that Sx and Kx fluctuate more between columns 
than their product. Therefore, we give at the bottom of 
Table U the effective degradation rates at a mean value 
of the concentration Dx , which appear to be much more 
consistent than the individual degradation parameters. 
Another example is that although the values of [it in the 
fourth and fifth columns are quite different, the mini- 
mum values taken by TOC1 transcription rate in both 
cases are in fact comparable (1.78 vs 1.45 nM/h), and 
result from different combinations of At , Pc j an d of the 
minimal value reached by the Pc profile, which is very 
close to the repression threshold Pc ■ This illustrates the 
general fact that although collective fitting can provide 
well-constrained predictions (here the time profiles), in- 
dividual parameters may be poorly constrained, a feature 
observed in many systems biology models [42| . 

Some key ingredients of the dynamics can still be ex- 
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tracted unambiguously from a careful examination of Ta- 
ble m The first are the strongly saturated degradations of 
Mc and Pt, characterized by values of Km c an d Kp T so 
small that the concentrations of the respective molecular 
actors are well above them for almost all of the diurnal 
cycle. The signature of this behavior in the time pro- 
files is a straight-line decay after the expression peak. A 
natural question is whether what appears as saturated 
degradation of a given actor may in fact result from an 
interaction with another actor not yet identified. In the 
case of TOC1, a post-transcriptional regulation actin g af - 
ter dusk has indeed been evidenced experimentally [37| . 
Other key features are a maximum transcription rate sig- 
nificantly higher for TO CI than for CCAl and a small 
threshold of repression Pc , comparable to the minimum 
value of the CCAl protein level. These are related since 
the latter implies that TOC1 is repressed most of the 
time except in a very narrow time interval around the 
minimum of CCAl protein level (this correlates with the 
narrow TOC1 mRNA peak). This must be compensated 
for by a high TOC1 transcription rate. Interestingly, 
the small value for Pc can account for the experimen- 
tally observed relative inefficiency of antisense strategies 
against CCAl |l|. If CCAl level is reduced, even signif- 
icantly, the small time interval where TO CI is not re- 
pressed will extend only slightly, with the dynamical be- 
havior at other times being mostly unchanged. Thus the 
global perturbation will remain limited and arrhythmia 
will be difficult to induce. In contrast to this, the acti- 
vation threshold Pr above which TOC1 activates CCAl 
is relatively low so that activation occurs over a rather 
large time interval of more than 10 hours. Therefore, 
modifications of the activation threshold or TOC1 pro- 
tein level will influence the dynamics for a long time and 
thus easily disrupt oscillations. This also explains why 
FRP is much more sensitive to overexpression of TOC1 
induced by TOCl:luc insertion than of CCAl induced by 
CCAlduc 0. 

Finally, we carried out data adjustment assuming free- 
running periods of 23.8 and 25 hours, to verify that our 
results do not depend critically on FRP (Fig. [SJ parame- 
ters given in Table HI)) . In this case, we allowed day /night 
modulation of some parameters to ensure frequency lock- 
ing with the diurnal cycle. The agreement remains very 
good, although it degrades noticeably compared to the 
case of a FRP of 24 hours. In particular, a phase shift 
of the TO CI mRNA profile is induced. The parameter 
modulation remains very small (for a FRP of 25 hours, 
the repression threshold is modulated but remains be- 
low the minimum of the CCAl temporal profile, so that 
the effect of the change is minimal) . We furthermore ob- 
served that when the FRP was a freely adjustable param- 
eter, it would systematically converge towards a value of 
24 hours. All this confirms that there is no day/night pa- 
rameter modulation in the TOC1-CCA1 feedback loop 
and that coupling can only occur in specific time windows 
as proposed in [2j. 



< 

fl 0.8 



< 
o 
o 



0.4 







-f-^ FRP=25 

\ a ) FRP=23.8_ 



J L 



J L 



20.8 

Q_ 

<0.4 

o 

o 



< 

[20.8 



o 
o 



0.4 







20.8 

Q_ 

G0.4 
o 







\ (b) 




V 


*/ 




// 






1 1 


1 1 1 






(c) /J| 








7 




J 




1 — j*/ i 






(d) / 
















. -1- . x-^J 





3 6 9 121518 21 24 
Time (h) 



FIG. 5: (Color figure) Adjustement of numerical solutions of 
the light-dependent model (solid lines) to target curves with 
floor levels removed (dash-dotted line) using the Smp score 
function. Adjustment of CCAl mRNA (a), CCAl protein 
(b), TOC1 mRNA (c), TOC1 protein (d) without the floor 
level of proteins. The FRP of the autonomous oscillator is ei- 
ther of 25 h (black solid line) and 23.8 h (light grey or red solid 
line). Synchronization is obtained for the 25 h (resp., 23.8 h) 
FRP model by assuming that the parameter Pco (resp., 8p c ) 
take a different value on the day and night. 



IV. DISCUSSION AND CONCLUSION 

In this work, we have carried out a careful and de- 
tailed analysis of time series characterizing temporal vari- 
ations of expression of the two central genes of Ostre- 
ococcus circadian clock, TOC1 and CCAl [J. These 
time series had been previously obtained from microar- 
ray data and luminescent reporter data recorded in two 
different experiments [H, HH. From these time series, we 
have extracted periodic temporal profiles for the RNA 
and protein concentrations of the two genes, assumed 
to represent the mean circadian oscillations in individual 
cells so as to optimize comparison with numerical profiles, 
which are inherently periodic. In particular, approximat- 
ing the periodic component of luminescence time series 
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TABLE II: Model parameter values. Optimal parameter 
values for adjustement of model to data using various score 
functions and assuming a free-running period of 25 or 23.8 
hours. Parameters are rescaled as in Table U 
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by a Fourier series allowed us to separate the dynam- 
ics at work throughout the day from fast transient pro- 
cesses activated only near light/dark and dark/light tran- 
sitions, and probably involved in occasional resetting of 
the clock (compare raw data and target profile in Fig. rjj . 
The data processing also allowed us to detect a bias in 
the luminescence time series, which was confirmed by a 
direct comparison of individual signals from two genet- 
ically identical cell cultures (they were not proportional 
to each other). This bias manifests itself in a nonzero lu- 
minescence floor level. Taking it into account allowed us 
to show that both protein levels approach zero at some 
times of the day, which was essential for the quality of 
the subsequent model adjustment. This illustrates how 
important it is to exploit the redundancy of data by veri- 
fying that data that should provide the same information 
are indeed consistent. 

Thanks to the careful data processing, we could evi- 
dence an extraordinarily good agreement between a min- 
imal model of a two-gene transcriptional feedback loop 
and the reconstructed concentration profiles. This model 
describes activation of CCAl by TOC1 and repression 
of TOC1 by CCAl. It describes only regulated tran- 
scription of the two genes, translation and degradation, 
and comprises only differential equations, from which the 
time evolution of the two mRNA and the two proteins 
can be computed and compared to the target profiles. 
Therefore a biochemically detailed model taking into ac- 
count compartmentalization or genomic insertions would 
have only served to adjust details without biological rele- 
vance, and could well have masked one of the main results 



which is the good adjustment by a free-running oscillator 
model. It is also quite remarkable that using only four 
points of the reconstructed profiles sufficed to constrain 
the numerical profiles to follow their targets throughout 
the day. Here also, adding more points would have only 
forced the model to adjust to unrelevant details without 
gaining information. This suggests that the complexity 
of the model and the constraining data should be care- 
fully matched. 

The excellent agreement found between the model and 
the data supports unambiguously the existence of a two- 
gene oscillator at the core of Ostreococcus circadian clock. 
This not only builds a solid foundation on which future 
studies of this clock can rely but also provides what we 
believe is one of the clearest examples of a natural few- 
gene oscillator evidenced from experimental data. This 
is all the more important as the role of this oscillator in 
the circadian clock constrains it to be extremely robust 
to all kinds of fluctuations. Understanding the dynam- 
ical ingredients besides transcriptional regulation that 
underlie its robustness will certainly be of high inter- 
est for the study of genetic oscillators in general. Two 
remarkable features of the TOC1-CCA1 oscillator have 
indeed emerged from our analysis. First, a strongly satu- 
rated degradation has been evidenced both for the CCAl 
mRNA (already noted in our previous work Q) and the 
TOC1 protein (detected in this work), which manifests 
itself by a straight-line decay at high concentrations after 
the expression peak. This behavior may be the signature 
of post-transcriptional and post-translation interactions, 
and is thus c omp atible with the experimental observa- 
tions of Ref. [33]. It also supports the putative role of 
saturated degradation mechanisms as an efficient mech- 
anism to introduce effective delays along a negative feed- 
back loop [l3, 143rl45{ and the recent observation that this 
is a key ingredient to generate robust oscillations [8|, |46| . 
The second remarkable feature predicted by the model is 
that the TOC1 gene is repressed by CCAl during most of 
the day, except during a short time interval located one 
or two hours before dusk, which is consistent with the 
very narrow peak of TOC1 mRNA expression observed 
in the experimental data. The small duration of TO CI 
expression is compensated by a high transcription rate. 
One may wonder whether this design has an influence on 
the robustness to molecular fluctuations. In any case, it 
explains why functional studies of the oscillator showed 
that it was much more sensitive to pcrtubations in the 
TOC1 level than in the CCAl level [J. 

It has been recently been proposed that circadian 
clocks should not only be robust to fluctuations in 
molecule numbers fvA li~8j and to temperature varia- 
tions [l9|, [2(| but also to fluctuations in the daylight in- 
tensity pattern, which is crucial to synchronize the cir- 
cadian clock to the day/night cycle @, [HJ]. Our results 
demonstrate such robustness for the TOCl-CCAl os- 
cillator in two different ways. First, the experimental 
data are reproduced accurately by a free-running oscil- 
lator model. This had already been noted by Thommen 
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et al. [2J, but the confirmation of this behavior using 
luminescence signals recorded every hour reinforces its 
plausibility significantly. As shown in [2j, it can only be 
explained by assuming that coupling to light is sched- 
uled so as to be active precisely when the oscillator does 
not respond to external perturbations. In this scheme, 
the oscillator is not affected by light in normal entrain- 
ment conditions, which naturally makes it blind to light 
fluctuations, and thus behaves as if it was free-running. 
When the oscillator drifts out of phase, light sensing oc- 
curs at a different time of its cycle, where it responds so 
as to recover the normal entrainment phase. The acute 
responses to light/dark and dark/light transitions that 
occur transiently in raw signals (Fig. [2]) may be corre- 
spond to such resetting. 

The second strong evidence for the robustness of the 
TOC1-CCA1 oscillator comes from the fact that the pro- 
files adjusted precisely and simultaneously by our simple 
model are reconstructed from time series recorded in dif- 
ferent experiments by two different techniques, with to- 
tally different setups and in particular under very differ- 
ent lighting conditions. This strongly suggests that Os- 
treococcus clock is able to tick exactly in the same way in 
different conditions, and in particular in different levels of 
light, which is an obvious requisite for robustness to day- 
light fluctuations. If the clock can cope with randomly 
varying daylight profiles without being perturbed, it can 
certainly accomodate constantly low or constantly high 
daylight levels and deliver similar biochemical signals in 
both cases. The fact that the model adjusted from RNA 
profiles predicts protein profiles and vice versa (although 
with less precision in the latter case) clearly illustrates 
the consistency between the two experiments. Again, we 
stress that this remarkable finding may have remained 
masked without our careful data processing. 

It is important to mention that both the presence of 
saturated degradation kinetics and the absence of effec- 
tive coupling to light when entrained are strong predic- 
tions that can be tested experimentally. 

While the impressive agreement between model and 
experiments obtained here clearly shows that a TOC1- 
CCAl oscillator underlies Ostreococcus clock, it should 



not make us forget that it is only a part of it. The free- 
running average behavior in entrainment conditions, the 
acute responses at day /night and night/day transitions 
can only be explained if the TOCl-CCAl loop interacts 
with other molecular loops and feedback loops. Indeed 
it was shown in [2j that robustness to daylight fluctua- 
tions involves a precisely timed coupling activation. This 
necessarily requires additional feedback loops designed 
so as to generate the signal that will drive optimal cou- 
pling. Furthermore, interactions with other molecular 
actors may well be responsible for the saturated degra- 
dation of CCA1 mRNA and TOC1 protein. 

We thus have now to identify these additional feedback 
loops as well as the light input pathways, which should 
help us to understand how the clock can synchronize in 
spite of fluctuations to day /night cycles of variable dura- 
tion across the year. Adapting to different photoperiods 
and to different light levels are indeed unrelated evolu- 
tionary goals that must be simultaneously satisfied. In 
the end, we hope to build an accurate and comprehen- 
sive model of a simple and robust circadian clock. If the 
agreement between theory and experiment remains com- 
parable to what was achieved in the present study, this 
may well provide us with new and deep insights into the 
design and function of circadian clocks. More generally, 
we believe our results promote Ostreococcus, whose low 
genomic redundancy [36[ is probably crucial for allowing 
accurate quantitative approaches, as a very promising 
model for systems biology. 
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